Age-dependent predictors of effective reinforcement motor learning across childhood

Across development, children must learn motor skills such as eating with a spoon and drawing with a crayon. Reinforcement learning, driven by success and failure, is fundamental to such sensorimotor learning. It typically requires a child to explore movement options along a continuum (e.g., grip location on a crayon) and learn from probabilistic rewards (e.g., whether the crayon draws or breaks). Here, we studied the development of reinforcement motor learning using online motor tasks to engage children aged 3 to 17 and adults (cross-sectional sample, N=385). Participants moved a cartoon penguin across a scene and were rewarded (with an animated cartoon clip) based on their final movement position. Learning followed a clear developmental trajectory when participants could choose to move anywhere along a continuum and the reward probability depended on final movement position. Learning was incomplete or absent in 3- to 8-year-olds and gradually improved to adult-like levels by adolescence. A reinforcement learning model fit to each participant identified three age-dependent factors underlying improvement: amount of exploration after a failed movement, learning rate, and level of motor noise. We predicted, and confirmed, that switching to discrete targets and deterministic reward would improve 3- to 8-year-olds’ learning to adult-like levels by increasing exploration after failed movements. Overall, we show a robust developmental trajectory of reinforcement motor learning abilities under ecologically relevant conditions, i.e., continuous movement options mapped to probabilistic reward. This learning appears to be limited by immature spatial processing and probabilistic reasoning abilities in young children and can be rescued by reducing the demands in these domains.

[Figure panel annotation: No trackpad data for this age group.]
Supp. Fig. 2. Example baseline paths for participants ages 3 to 11 years old. Each trajectory begins at (0,0) and ends at Y = 24, when the penguin crosses the back edge of the ice. The final X position of each trajectory corresponds to the interpolated final position of the movement (see Methods for additional details). Where available, a sample from each input device type is provided for each age bin. Note that trajectories tend to be straighter for touchscreen input than for other devices. The twenty squares represent the target centers. Note that the full reward zone is not shown because of overlap between targets. Unrewarded and rewarded paths are shown as dashed and solid lines, respectively.
Supp. Fig. 3. Example baseline paths for participants ages 12 to 17 years old and adults. Same format as Supp. Fig. 2.
Path length ratios. The path length ratio is a measure of path curvature (path length divided by the distance from the first to the last point of the movement) for the four tasks. Significant pairwise comparisons between age bins are indicated above the plots as follows: * = p < 0.05, + = p < 0.01, and ∆ = p < 0.001. Bars show mean and standard error of the mean.
Timing information. Reaction time (time from when the penguin appeared until the participant clicked on the penguin to start the trial), stationary time (time from click to start of movement), movement time (time from start to end of movement), and game time (time to complete the whole task, in minutes) for the four tasks, split by age bin. Significant pairwise comparisons between age bins are indicated above the plots as follows: * p < 0.05, + p < 0.01, ∆ p < 0.001. Bars show mean and standard error of the mean.
History dependence of change in reach as a function of reward history for the continuous probabilistic task. We performed a regression analysis on the change in absolute reach direction (∆X_t = X_t − X_{t−1}) as a function of whether the last three trials were successes or failures. That is, we fit ∆X_t = w_0 + w_1 f_{t−1} + w_2 f_{t−2} + w_3 f_{t−3}, where f_t is 1 if trial t was a failure and 0 if it was a success. Note that w_0 reflects the change in reach when there were no failures in the previous three trials. This decreased with age and may represent decreasing sensorimotor noise (w_0: 100 out of 111 participants significant at p = 0.05; correlation with age R² = 0.122, F = 15.2, p < 0.001). w_1, w_2, and w_3 reflect the contribution to the change in reach of failing on the previous trial, two trials ago, and three trials ago, respectively. The change in reach after one failure increased with age (w_1: n = 66 out of 111 participants significant; correlation with age R² = 0.139, F = 17.6, p < 0.001). The effects of failure two and three trials back were mostly not significant (w_2: 20 out of 111 participants significant; correlation with age R² = 0.0166, F = 1.83, p = 0.178; w_3: 17 out of 111 participants significant; correlation with age R² = 0.119, F = 14.7, p < 0.001). For the adults, the average of all of the data points is plotted as a horizontal line.
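The history regression described in the caption above can be illustrated with a short least-squares sketch. It assumes that the dependent variable is the absolute trial-to-trial change in final position and that the weights are fit separately for each participant by ordinary least squares; the function name `fit_history_regression` and the synthetic data are hypothetical, and the per-participant significance tests reported in the caption are not reproduced here.

```python
import numpy as np


def fit_history_regression(x, failed):
    """Fit |dX_t| = w0 + w1*f[t-1] + w2*f[t-2] + w3*f[t-3] by least squares.

    x      : final reach position on each trial (1-D sequence)
    failed : 1 if the trial was a failure, 0 if it was a success
    Returns the weights [w0, w1, w2, w3].
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(failed, dtype=float)
    if x.shape != f.shape or x.ndim != 1:
        raise ValueError("x and failed must be 1-D arrays of equal length")
    t = np.arange(3, len(x))              # trials with three trials of history
    y = np.abs(x[t] - x[t - 1])           # assumed dependent variable
    design = np.column_stack([np.ones(len(t)), f[t - 1], f[t - 2], f[t - 3]])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return w


# Hypothetical noise-free data generated from known weights.
rng = np.random.default_rng(0)
n_trials = 200
failed = rng.integers(0, 2, size=n_trials)
true_w = np.array([1.0, 2.0, 0.5, 0.0])
x = np.zeros(n_trials)
for t in range(3, n_trials):
    step = (true_w[0] + true_w[1] * failed[t - 1]
            + true_w[2] * failed[t - 2] + true_w[3] * failed[t - 3])
    sign = 1.0 if t % 2 == 0 else -1.0    # alternate direction to keep x bounded
    x[t] = x[t - 1] + sign * step

w_hat = fit_history_regression(x, failed)  # should match true_w up to rounding
```

Because the synthetic changes in reach are generated without noise, the fitted weights should reproduce `true_w`; with real data, each weight would additionally need a standard error and a significance test.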
Model parameter recovery. The recovered vs. true parameters for synthetic data generated by the model (for 100 learning trials) and then fit. Correlations are shown above the plots.
Comparison of significant models. Some participants in each age bin were best fit by the noise-only model (left) rather than the full model (right). When the participants best fit by the noise-only model were removed, the same age-related trends in learning remained as depicted in Fig. 3. Children aged 3 to 8 years show poor learning compared to older participants.
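The logic of the parameter-recovery check in the "Model parameter recovery" caption (simulate with known parameters, fit, then correlate recovered with true values) can be sketched as follows. This is not the authors' reinforcement learning model: it is a hypothetical two-parameter toy in which the aim point is fixed, failures occur at random with probability 0.5, and extra exploration noise is added on the trial after a failure. The fit uses simple variance matching rather than whatever fitting procedure the paper used. The names `simulate_toy`, `fit_toy`, `sigma_motor`, and `sigma_explore` are assumptions made for illustration.

```python
import numpy as np


def simulate_toy(sigma_motor, sigma_explore, n_trials, rng, p_fail=0.5):
    """Toy data: reaches around a fixed aim point (0), with added
    exploration noise on trials that follow a failure."""
    failed = rng.random(n_trials) < p_fail            # random outcomes (toy assumption)
    prev_failed = np.concatenate([[False], failed[:-1]])
    reach = sigma_motor * rng.standard_normal(n_trials)
    reach += prev_failed * sigma_explore * rng.standard_normal(n_trials)
    return reach, prev_failed


def fit_toy(reach, prev_failed):
    """Variance matching: motor noise from post-success trials,
    exploration from the extra variance on post-failure trials."""
    var_success = np.var(reach[~prev_failed], ddof=1)
    var_failure = np.var(reach[prev_failed], ddof=1)
    sigma_motor = np.sqrt(var_success)
    sigma_explore = np.sqrt(max(var_failure - var_success, 0.0))
    return sigma_motor, sigma_explore


rng = np.random.default_rng(1)
n_participants = 200
true_params = np.column_stack([
    rng.uniform(0.5, 2.0, n_participants),   # true sigma_motor
    rng.uniform(0.0, 5.0, n_participants),   # true sigma_explore
])
recovered = np.array([
    fit_toy(*simulate_toy(sm, se, 100, rng))  # 100 trials, as in the caption
    for sm, se in true_params
])

# Correlation between true and recovered values, one per parameter.
r_motor = np.corrcoef(true_params[:, 0], recovered[:, 0])[0, 1]
r_explore = np.corrcoef(true_params[:, 1], recovered[:, 1])[0, 1]
```

High correlations indicate that the parameters are identifiable from 100 trials under the toy's assumptions; a full check would repeat the same loop with the actual model and its fitting routine.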
Adult and 3- to 8-year-old children's performance on all four tasks. Panels are in the same format as Fig. 6a.

Mediation analysis for continuous probabilistic task. Results of the effect of age on learning mediated by baseline variability and variability after failure. Baseline variability and variability after failure together partially mediate the effect of age on learning. Significant effects are in bold. β: regression coefficient; SE: standard error.
Supp. Tab. 2. Mediation analysis for discrete probabilistic task. Results of the effect of age on learning mediated by variability after success and after failure. Together, variability after success and after failure completely mediate the effect of age on learning. Significant effects are in bold. β: regression coefficient; SE: standard error.
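To make the terms "partially" and "completely" mediate concrete, the following is a minimal sketch of a two-mediator analysis using ordinary least squares and the product-of-coefficients decomposition. The source does not state which mediation procedure was used, so this is an assumption; the function names `ols` and `mediation_paths` and all synthetic data are hypothetical, and no standard errors or significance tests are computed.

```python
import numpy as np


def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mediation_paths(age, mediators, learning):
    """Decompose the effect of age on learning through k mediators.

    mediators : array of shape (n_participants, k)
    Returns the total effect, the direct effect (controlling for the
    mediators), and the indirect effect a_i * b_i for each mediator.
    """
    mediators = np.asarray(mediators, dtype=float)
    total = ols(learning, age)[1]                        # age -> learning
    a = np.array([ols(m, age)[1] for m in mediators.T])  # age -> each mediator
    full = ols(learning, age, mediators)                 # age + mediators -> learning
    direct, b = full[1], full[2:]
    return {"total": total, "direct": direct, "indirect": a * b}


# Hypothetical data: two variability measures that change with age.
rng = np.random.default_rng(2)
n = 300
age = rng.uniform(3.0, 18.0, n)
baseline_var = 5.0 - 0.2 * age + rng.normal(0.0, 0.5, n)
post_failure_var = 1.0 + 0.1 * age + rng.normal(0.0, 0.5, n)
learning = (0.05 * age - 0.3 * baseline_var + 0.4 * post_failure_var
            + rng.normal(0.0, 0.3, n))

paths = mediation_paths(age, np.column_stack([baseline_var, post_failure_var]),
                        learning)
```

With least-squares fits on the same sample, the total effect equals the direct effect plus the sum of the indirect effects. Mediation is usually called partial when the direct effect remains significant after the mediators are included and complete when it does not, which requires a significance test not shown in this sketch.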